Intelligent Audio Production Strategies Informed by Best Practices
Authors: Pestana and Reiss
Abstract
This article investigates the fundamental constraints that should underlie algorithm development in intelligent audio production systems. Through mix analysis and grounded theory strategies, we seek a best-practices framework for the craft of mixing. The findings, while not to be taken as dogma, give a clear indication of preferred implementation strategies and show what still needs to be done to fully understand the technical choices that audio mixing has incorporated throughout its history.

1. CONTEXT

The last five years have witnessed a blooming of research in the field of automatic mixing [1], powered by cross-adaptive digital audio algorithms [2]. Most of the strategies developed, while showing promising results, have relied mainly on the authors' experience, or on literature review in a field where literature is scarce. We argue that a more thorough exploration of the underlying premises is essential for more effective mapping and design strategies in intelligent audio production tools. We have approached this through extensive work on the best practices of mixing, which culminated in the first author's PhD thesis [3]. The text herein is a short summary of selected findings, highlighting those for which strong conclusions could be drawn. We are especially interested in conclusions that go against the established assumptions found in previous research.

This work draws on approaches based on knowledge engineering (KE) [4], grounded theory (GT) [5] and machine learning (ML) [6]. KE seeks to integrate expert knowledge into computer systems in order to solve tasks that usually require a high level of human expertise.
For intelligent audio production, we know the knowledge lies in the hands of top practitioners, but extracting it is not always trivial: over the last half century, practical sound engineering has moved from a technical to an artistic field, and practitioners are often inclined to believe there is no knowledge involved. GT is a discipline that strives to systematically generate theory from data stemming from empirical research. In our case it means looking at complex psychoacoustic evaluation studies and extracting meaningful data from listener preference. Finally, ML relies on the construction of systems that can learn from data. After a training step on a learning data set, the algorithm should be able to perform accurately on new examples.

In the original work [3] we first relied on literature review and an extensive interview process to crystallize 88 potential assumptions about how technical decisions in mixing are made. Figure 1 highlights some of these assumptions. These suppositions were examined and eventually validated by one of seven different strategies:

1. Measuring parameters from mixing sessions of successful songs.
2. Having successful sound engineers perform specifically tailored mixing exercises.
3. Measuring features from completed successful mixes.
4. Performing subjective listening tests on experienced subjects.
5. Analyzing through quantitative surveys the habits of successful mixing engineers.
6. Performing exploratory interviews with successful mixing engineers.
7. Using literature review.

This is a purposely ordered list, as each element yields more robust conclusions than those that succeed it. Options 1 and 2 grant us objective, quantifiable access to the workings of the mind of successful engineers performing successful mixes. Option 3 is equally robust, but may be tainted by mastering and conversion practices, and is limited in the scope of assumptions it can prove. Option 4 is not as objective in nature but, if performed on a large enough scale with experienced subjects, can give a good estimation of best practices. Options 5 and 6 introduce the problems of bias and status that arise from the sharing of private methodologies, with option 5 having the advantage of being quantifiable. Finally, option 7 is considered the least revealing, because technical literature on mixing is scarce and written by authors who are not as successful in the craft of mixing as those consulted in options 5 and 6.

In our research [3] we were able to obtain contributions from nearly 60 successful professional sound engineers (the criteria were having mixed number one albums or singles, or having won a prestigious award for sound engineering), to build a panel of up to 70 listeners for subjective evaluation purposes, and to extract information from a dataset of over 900 songs that were number one singles in either the UK or the US. We performed over 20 subjective evaluation tests, conducted over 100 interviews, and examined a 49-question survey directed at the almost 60 experts.

AES 53RD INTERNATIONAL CONFERENCE, London, UK, 2014 January 27–29. PESTANA AND REISS, Intelligent Audio Production Strategies Informed by Best Practices

| # | Title | Proven | Origin | Tested |
|---|---|---|---|---|
| 01 | All signals should be presented with equal loudness. | False | PI | SE; Q |
| 02 | The main element should be up by an understandable amount of loudness units. | True | INT | EX; MM; SE; Q |
| 03 | Vocals should be ridden above the backing track. | True | INT; LIT | EX; Q |
| 04 | No element should be able to mask any of the frequency content of the vocals. | True | INT; PI | Q |
| 05 | Track panning affects partial loudness. | True | LIT | EX; SE |
| 06 | Dynamic range compression affects relative loudness choices. | False | INT | SE |
| 07 | Low-end frequencies should be centrally panned. | True | LIT; INT; PI | MM; SE |
| 08 | The main track is always panned centrally. | True | LIT; INT; PI | MM |
| 09 | Remaining tracks are panned out of the center. | True | LIT; INT | EX; MM; Q |
| 10 | The higher the frequency content, the more a track can be panned sideways. | False | LIT; PI | MM |
| 11 | Frequency balance should be kept between left and right. | True | LIT; INT; PI | MM; Q |
| 12 | Hard panning should be avoided. | False | LIT; PI | SE; Q |
| 13 | Sources recorded with close (mono) and far (stereo) techniques simultaneously should have the mono source panned to the same perceived position featured in the stereo source. | True | INT | Q |
| 14 | Monophonic compatibility should be kept. | True | LIT; INT | MM; Q |
| 15 | Panning is mostly done audience-perspective. | False | LIT | Q |
| 16 | It is customary to apply temporal cues to panning. | False | PI | Q |
| 17 | Equalization is frequently done to avoid inter-track masking effects. | True | LIT; INT; PI | EX; Q |
| 18 | Salient resonant frequencies should be subdued. | True | INT | Q |
| 19 | High-pass filters should be used in all tracks with no significant low-frequency content. | False | LIT; PI | SE; Q |
| 20 | There is a specific low-mid region that can be attenuated to improve clarity. | False | LIT | SE; Q |
| 21 | Expert mixers tend to cut more than boost. | False | LIT | Q |
| 22 | High Q-factors should be used when cutting and low Q-factors when boosting. | True | LIT; INT | Q |
| 23 | Equalization use should always be minimized. | False | LIT | Q |
| 24 | Every song is unique in its spectral/timbral contour. | True | INT | MM; Q |
| 25 | Reverb time is strongly dependent on song tempo. | False | INT | SE; Q |
| 26 | Reverb time is strongly dependent on an autocorrelation measure. | True | | SE |
| 27 | Delay times are typically locked to song tempo. | True | LIT; INT | SE; Q |
| 28 | The pre-delay is timed as a multiple of the subdivided song tempo. | True | LIT; INT | SE; Q |
| 29 | The level of the reverb returns is on average set to a specific amount of loudness lower than the direct sound. | True | | SE |
| 30 | Low-end frequencies are less tolerant of reverb and delay. | True | LIT; INT | EX; Q |
| 31 | Transients are less tolerant of reverb and delay. | True | LIT; INT | EX; Q |
| 32 | The sends into the reverbs should be equalized. | True | INT | Q |
| 33 | Reverbs can be carefully substituted by delays to lessen masking effects. | True | INT | SE; Q |
| 34 | Compression takes place whenever a source track varies too much in loudness. | True | LIT; INT | EX; SE; Q |
| 35 | Compression takes place whenever headroom is at stake, and the low end is usually more critical. | True | INT | MM; EX; SE; Q |
| 36 | Gentle bus/mix compression helps blend things better. | True | LIT; INT | SE; Q |
| 37 | There is an optimal amount of compression in terms of dB and it depends on sound source features. | True | LIT | EX; Q |
| 38 | Compression should not be overused and there are maximum values for it. | False | LIT | EX; Q |
| 39 | Compressor attack is set up so that only the transient goes through. | False | LIT | EX; Q |
| 40 | Compressor release is set up so that it is over when the next note is about to start. | False | LIT | EX; Q |
| 41 | It is acceptable to judiciously lop off some micro-burst transients to gain peak-to-RMS space. | True | | SE; Q |
| 42 | In deciding a track's dynamic profile, an expert engineer will shift the focus of the listener by enhancing different tracks over time, with volume changes that may sometimes be quite big. | True | INT | EX; Q |

Fig. 1: Selected assumption overview. The origin of the assumption can be literature review (LIT), the interview process with professionals (INT), or an assumption made in previous implementations (PI). The method of testing is either mixing exercises by professionals (EX), measuring number one hit singles for features (MM), subjective evaluation with a listening panel (SE) or a questionnaire sent to professionals (Q).
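The assumptions in Fig. 1 lend themselves to being stored as simple records, which makes it easy to ask, for instance, which assumptions inherited from previous implementations (PI) were not confirmed. The sketch below is only an illustration: the `Assumption` class and the `disproven_from` helper are hypothetical names, not part of the original work, and only five rows of Fig. 1 are copied in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumption:
    """One row of Fig. 1 (hypothetical record layout, not from the paper)."""
    number: int
    title: str
    proven: bool
    origin: frozenset  # subset of {"LIT", "INT", "PI"}
    tested: frozenset  # subset of {"EX", "MM", "SE", "Q"}


# A small subset of Fig. 1, copied by hand for illustration only.
ASSUMPTIONS = [
    Assumption(1, "All signals should be presented with equal loudness.",
               False, frozenset({"PI"}), frozenset({"SE", "Q"})),
    Assumption(7, "Low-end frequencies should be centrally panned.",
               True, frozenset({"LIT", "INT", "PI"}), frozenset({"MM", "SE"})),
    Assumption(12, "Hard panning should be avoided.",
               False, frozenset({"LIT", "PI"}), frozenset({"SE", "Q"})),
    Assumption(19, "High-pass filters should be used in all tracks with no "
                   "significant low-frequency content.",
               False, frozenset({"LIT", "PI"}), frozenset({"SE", "Q"})),
    Assumption(36, "Gentle bus/mix compression helps blend things better.",
               True, frozenset({"LIT", "INT"}), frozenset({"SE", "Q"})),
]


def disproven_from(source, assumptions):
    """Return the numbers of assumptions with the given origin that were not proven."""
    return [a.number for a in assumptions
            if source in a.origin and not a.proven]


print(disproven_from("PI", ASSUMPTIONS))  # [1, 12, 19]
```

Extending the list to all rows of Fig. 1 would allow the same query to be run per origin or per test method, which is one way an intelligent mixing tool could decide which rules to trust.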
We shall now explore significant conclusions on topics of loudness (Section 2), panning (Section 3), equalization (Section 4), temporal processing (Section 5) and dynamic range control (Section 6). We then move on to a broader overview of our conclusions, looking at potential areas for further work.